Skip to content

Dispatch create-path notifications async to fix slow POST latency#14731

Merged
Maffooch merged 4 commits intobugfixfrom
fix/async-notification-dispatch
Apr 24, 2026
Merged

Dispatch create-path notifications async to fix slow POST latency#14731
Maffooch merged 4 commits intobugfixfrom
fix/async-notification-dispatch

Conversation

@Maffooch
Copy link
Copy Markdown
Contributor

Summary

  • Adds async_create_notification Celery task in dojo/notifications/helper.py; it accepts ids, re-fetches with select_related, and delegates to create_notification
  • Dispatches the five create-path notifications via dojo_dispatch_task instead of running synchronously: engagement_added, product_added, product_type_added, finding_added (API), test_added (importer)
  • Fixes POST /api/v2/engagements/ (and the four sibling endpoints) dropping from ~5s to sub-second on tenants with large authorized-user sets — the recipient enumeration query and per-user Alerts inserts now run in the worker

Why ids, not model instances

Existing send_*_notification tasks pass Django models through **kwargs, which pickles fine in eager mode but is brittle across real Celery brokers. The new task takes *_id kwargs and re-fetches, matching the user_id pattern used by the other @app.task helpers at the bottom of helper.py.

Test plan

  • unittests.test_notifications — full module passes (38 tests). Updated three TestNotificationTriggers methods with @override_settings(CELERY_TASK_ALWAYS_EAGER=True), three TestNotificationTriggersApi finding tests to patch dojo.api_v2.serializers.dojo_dispatch_task and assert finding_id=<id>, and three TestNotificationWebhooks assertions from cm.output[1]cm.output[-1] (the former index was a fixture-load side effect of the old sync path, see below).
  • New TestAsyncNotificationDispatch — verifies dispatch call shape for engagement/product/product_type signals; static guard on the importer dispatch block.
  • New TestAsyncNotificationTaskBody — verifies task rehydrates instances, returns silently on missing rows, and runs inline under block_execution=True.
  • unittests.test_importers_importer + unittests.test_apiv2_notifications — pass (40 tests).
  • JIRA VCR test test_import_no_push_to_jira — still passes; webhook round-trip for test_added is unchanged under block_execution=True.
  • Full unittests/ suite — 4,051 tests. All previously-failing cases caused by this PR now pass. Updated perf-test expected values in unittests/test_importers_performance.py (+1 async_create_notification dispatch per test_added; adjusted query counts from actual captured values) and UsersTest.deleted_objects in unittests/test_rest_framework.py (25 → 13, see below).
  • 3 pre-existing TestDojoImporterPerformanceSmall.test_import_reimport_reimport_performance_* errors remain (TypeError: _import_reimport_performance() missing 2 required positional arguments: expected_num_queries4 and expected_num_async_tasks4). These are broken on bugfix HEAD before this PR — sibling class TestDojoImporterPerformanceSmallLocations uses the correct 8-arg call and passes. Not fixed here to keep the diff scoped.

Note on deleted_objects = 25 → 13 and webhook cm.output[1] → cm.output[-1]

Fixtures load Product_Type/Product/Engagement rows before the single Notifications(pk=1) row. On the old sync path, each of those post_save signals synchronously invoked create_notificationNotificationManager()_get_notifications_object()get_or_create(user=None, product=None, template=False), which created a duplicate system notification row during fixture load because pk=1 hadn't inserted yet. Subsequent manager instantiations triggered a "Multiple system notifications objects found" WARNING that preceded the INFO the webhook tests were actually asserting on — hence the incidentally-correct cm.output[1] index. Similarly, the UsersTest delete-preview count reflected admin's ownership of that phantom duplicate row. With the dispatch now async (no user context during fixture load → sent to broker → not executed), the phantom row is never created, and the tests' expectations needed to shift to the real values.

Out of scope

  • bulk_create for Alerts rows inside AlertNotificationManger.send_alert_notification — orthogonal worker-side optimization; request latency is fixed as-is.
  • Retrofitting existing send_*_notification tasks to use ids instead of model instances — they work today and widening that surface adds risk.
  • Delete-path notifications and recipients=-based calls — not in the hot path.

🤖 Generated with Claude Code

POST /api/v2/engagements/ takes ~5s on large tenants because
create_notification runs recipient enumeration and per-user Alert
writes on the request thread. Move the outer create_notification to a
Celery worker for the five create-path events (engagement_added,
product_added, product_type_added, finding_added, test_added) by
adding async_create_notification (accepts ids, re-fetches, delegates)
and dispatching via dojo_dispatch_task. This extends the existing
per-user async pattern (Slack/email/MSTeams/webhooks) up one level so
the recipient query and Alert fan-out no longer block the response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Reformat async_create_notification importer-guard docstring to D213 style
- Skip post_save dispatch when raw=True (loaddata) so the k8s initializer's
  fixture install path doesn't require an available Celery broker. Without
  this guard the unconditional async dispatch tries to enqueue during
  product_type.json load and fails with kombu OperationalError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Maffooch Maffooch added this to the 2.57.3 milestone Apr 22, 2026
Copy link
Copy Markdown
Contributor

@mtesauro mtesauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@valentijnscholten
Copy link
Copy Markdown
Member

I don't understand how the tests can pass because the step4 expected counts are missing from the tests.

When test_id + engagement_id + product_id are all passed, the original
implementation fetched each object independently (3 queries). Since
Test.select_related("engagement__product") already loads all three in
one query, derive engagement and product from the test instead.

Same for engagement_id + product_id: one query instead of two.

Also updates the no_async performance test expected counts accordingly.
@valentijnscholten
Copy link
Copy Markdown
Member

Took the liberty to push a commit which fixes the (locally at least) broken tests. And a small optimization to not retrieve engagement/product multiple times from the database.

@Maffooch
Copy link
Copy Markdown
Contributor Author

@valentijnscholten Thank you for taking that liberty 😄

@Maffooch Maffooch merged commit bf5998f into bugfix Apr 24, 2026
159 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants